Abstract. The Global Modeling Initiative has integrated two 35-year simulations of an ozone
recovery scenario with an offline chemistry and transport model using two different 
meteorological inputs. Physically based diagnostics, derived from satellite and aircraft data sets, 
are described and then used to evaluate the realism of temperature and transport processes in the 
simulations. Processes evaluated include barrier formation in the subtropics and polar regions, 
and extratropical wave-driven transport. Some diagnostics are especially relevant to simulation 
of lower stratospheric ozone, but most are applicable to any stratospheric simulation. 

The temperature evaluation, which is relevant to gas phase chemical reactions, showed that 
both sets of meteorological fields have near climatological values at all latitudes and seasons at 
30 hPa and below. Both simulations showed weakness in upper stratospheric wave driving. The 
simulation using input from a general circulation model (GMIgcm) showed a very good residual 
circulation in the tropics and northern hemisphere. The simulation with input from a data 
assimilation system (GMIdas) performed better in the midlatitudes than at high latitudes. Neither 
simulation forms a realistic barrier at the vortex edge, leading to uncertainty in the fate of ozone-depleted
vortex air. Overall, tracer transport in the offline GMIgcm has greater fidelity
throughout the stratosphere than that in the GMIdas.
1. Introduction


For the past few decades, chemistry and transport models have been used to assess the 
impact of natural and anthropogenic perturbations such as aircraft emissions or 
chlorofluorocarbon growth on stratospheric ozone. Most assessments relied on two-dimensional 
(zonally averaged) models that cannot physically represent inherently 3D processes such as 
transport out of the polar vortex and cross-tropopause transport [Park et al., 1999]. Some recent
efforts have used three-dimensional chemistry and transport models (CTMs), which provide 
more realistic representations of non-zonal processes [Danilin et al., 1998; Douglass et al., 1999; 
Kinnison et al., 2001], although the third dimension greatly increases the computational 
requirements and demands greater human resources for evaluation. In spite of this, development 
of a 3D assessment model is a worthwhile goal because it offers the opportunity to improve the 
physical basis of assessment modeling and, if the 3D model compares well against observations, 
reduce uncertainties due to transport. 

The Global Modeling Initiative (GMI) was formed in 1995 with the goal of producing a 
well-tested 3D chemistry and transport model that could be used for assessments and other 
controlled experiments that required a common framework. In their first experiment, the GMI 
science team used the same offline chemistry and transport model [Rotman et al., 2001] with 3 
different sets of meteorological input to evaluate which input would provide the most realistic 
simulation of an emissions scenario [Kinnison et al., 2001]. To establish which simulation was 
the most credible, physically based tests, derived from aircraft and satellite data sets, were used 
to evaluate the simulations [Douglass et al., 1999]. Six tests were created, each evaluating a 
different aspect of stratospheric transport and mixing, and grading standards were defined by the 
observations and their uncertainties. The simulations received scores on each test that could then 
be used to quantitatively distinguish between them. At the end of the evaluation, the GMI
science team could objectively select the best meteorological data set for simulating the effects 
of supersonic aircraft on the stratosphere. 

Recently, the GMI science team ran two 35-year integrations of the WMO scenario MA2 
[WMO, 2002] with the GMI-CTM. During the period simulated, 1995-2030, the scenario’s 
organic chlorine and bromine boundary conditions decline while N2O and CH4 increase. The 
GMI model, input meteorological fields, and the scenario simulated are described in detail in 
Connell and Douglass [2003]. While the intent of this WMO scenario is to predict future ozone 
change, the primary purpose of this GMI study is to assess the sensitivity of model predictions to 
differences in transport. We chose two inputs for these simulations, one from the Finite Volume 
General Circulation Model (FVGCM) and the other from the Finite Volume Data Assimilation 
System (FVDAS). The CTM calculations used one year of meteorological fields from each model,
repeating them for the 35-year simulation; the CTM simulations will be referred to as GMIgcm
and GMIdas. These data sets were selected because, although they have significant differences in
residual circulation and mixing [Schoeberl et al., 2003], each is also known to realistically 
represent some aspects of the stratosphere. Initial evaluations using the GMI grading criteria on 
Goddard CTM simulations using the FVGCM and FVDAS winds showed that both data sets 
have stratospheric transport characteristics superior to the previously tested GMI simulations. 

In this paper we build on the observationally based model evaluation philosophy discussed 
in Douglass et al. [1999]. The emphasis here is not just ‘comparison with data’, but 1) 
identification of atmospheric processes relevant to the realistic simulation of a phenomenon or 
feature, and 2) identification of a data set that demonstrates the occurrence of this process. Such 
a data set becomes the physical basis for a model evaluation test. A model earns credibility when 
it can be shown to realistically represent known atmospheric processes. Some of the atmospheric
processes relevant to the WMO MA2 scenario can be tested using previously defined GMI 
evaluation criteria. In this paper, new physically based stratospheric diagnostics are presented 
that illustrate additional stratospheric processes. While some of the tests are especially relevant 
to this scenario with declining halogens, all tests are generally applicable to any stratospheric 
simulation. In the following sections, we identify some important stratospheric processes and 
their diagnostic tests, and apply them to two GMI 3D-CTM simulations in order to evaluate 
many (though not all) aspects affecting their credibility in an ozone recovery scenario. Applying 
these tests to simulations run in a common framework (i.e., the GMI-CTM) allows us to examine 
the sensitivity of the results to the meteorological input. 

2. Evaluating the suitability of GMI simulations for use in ozone predictions 

How do you evaluate a model’s credibility? The GCM-Reality Intercomparison Project for 
SPARC (GRIPS) compared temperature and wind fields in 13 three-dimensional middle atmosphere climate
models against observations to identify deficiencies in dominant atmospheric features, such as 
the location of the jets and polar temperatures [Pawson et al., 2000]. To evaluate an assessment 
model, one might also look at qualitative agreement with historical ozone trends. The Scientific 
Assessment of Ozone Depletion [WMO, 2002] shows many models' simulations of column 
ozone. While many models show qualitative agreement with historical ozone from 1980 to 2000, 
their predictions of future ozone diverge. Such agreement is a misleading diagnostic since the 
total column represents the integrated effects of chemistry and transport at many altitudes. To 
understand the difference in performance between 8 chemistry-climate models, Austin et al. 
[2003] chose several specific diagnostics, such as ozone climatology, polar temperature biases, 
and poleward heat fluxes. Their intent, commensurate with that of GMI, is to find a range of
diagnostics relevant to processes influencing ozone and use them to reduce the uncertainty in 
predictions of future ozone levels. 

In this study we assess model credibility by evaluating temperature and transport processes. 
This is done with observationally based tests at a variety of altitudes and latitudes. In GMI-1, we 
developed tests to assess model transport processes in regions that would be perturbed by 
stratospheric aircraft exhaust. Since emissions were projected to occur in the upper troposphere 
and lower stratosphere, transport near the tropopause was especially important and tests were 
developed that emphasized model fidelity there. In this ozone recovery scenario, transport 
fidelity is important at all levels where Cl chemistry changes. For example, in the lower 
stratosphere, we expect changes in PSC processes to affect O3, and in the upper stratosphere, we 
expect ozone loss by gas phase Cl reactions to be important. In this section, new tests are 
presented that expand the scope of the GMI’s stratospheric evaluation. These tests, as well as 
some of the previous GMI tests, are applied to two new GMI-CTM simulations. 

Data sets used in these tests are from the National Centers for Environmental Prediction
(NCEP) /National Center for Atmospheric Research (NCAR) Reanalysis, the Upper Atmosphere 
Research Satellite (UARS), and an ER-2 airborne spectrometer. NCEP/NCAR temperature 
analyses from 1980-1999 are used to create a climatology of monthly temperature distributions 
for 8 stratospheric levels and 11 latitude bands. (See Newman et al. [2001] for details of the 
reanalysis products.) UARS data sets include N2O and CH4 from the Cryogenic Limb Array 
Etalon Spectrometer (CLAES) [Roche et al., 1996] and CH4 from the Halogen Occultation 
Experiment (HALOE) [Park et al., 1996]. Both instruments began operation in October 1991;
CLAES made measurements for about 18 months, and HALOE has operated nearly continuously
since UARS was launched. CLAES has high-density spatial sampling that alternates between
35°S-80°N and 35°N-80°S every 35 days. HALOE collects ~30 profiles daily with latitudes ranging
from 80°S-80°N; latitudes poleward of 60° are only sampled in spring and summer. HALOE data 
sets are especially useful for investigating interannual variability and lower stratospheric 
transport. 

2.1 Generalized tests of stratospheric temperature and transport 

In this section, observationally based tests derived from aircraft and satellite observations 
are used to probe basic aspects of the stratosphere, such as temperature, transport, and mixing 
characteristics. The tests are used to identify strengths and weaknesses in the GMI-CTM 
simulations, but can be sensibly applied to any 3D online or offline simulation. In Section 2.2, 
we present tests that are particularly relevant to assessing the effects of polar ozone loss.

2.1.1 Temperature 

Both gas and heterogeneous phase chemical reactions are important in the stratosphere. 
Although both reaction types depend on temperature, they require different tests to evaluate 
model temperature behavior. Gas phase temperature-dependent reactions will proceed at a 
slightly slower or faster rate as temperature varies, but heterogeneous reactions only occur if the 
necessary temperature threshold is reached. (For regions where heterogeneous reactions are 
important, temperature distribution matters more than the mean. This evaluation will be 
discussed in Section 2.2.1.) To judge model temperatures for gas phase chemical reactions, we 
compare how often and how closely model temperatures agree with climatological values. This 
is accomplished by comparing the model and observed most probable temperatures for a given 
month, latitude, and altitude. Other evaluations of climate models have focused on polar 
temperatures or have looked at broadly averaged (i.e., monthly, annual, or global) temperatures 
[Austin et al., 2003; Pawson et al., 2000]. The temperature diagnostic shown here attempts to
make a spatially and temporally thorough comparison using as little averaging as possible, with
results that can be displayed simply. Model temperatures in the GMI simulations are a
property of the input meteorological fields (i.e., of the FVGCM and the FVDAS). 

Stratospheric temperature evaluation is based on the climatological mean distribution of 
NCEP/NCAR reanalysis temperatures from 1980-1999. For the entire 20-year data set, daily 
area-weighted temperature distributions are calculated for 11 latitude bands and 8 pressure levels
from 150-1 hPa, and a mean distribution for each month is calculated. Climatological monthly 
mean distributions for each latitude band and pressure level are then determined by averaging 
over the 20 years. The test itself examines the difference between the model and climatological 
most probable temperature (MPT) for each month and latitude band, resulting in 132 points of 
comparison on each of 8 pressure surfaces. Figure 1 shows how the FVGCM and FVDAS MPTs 
differ from the NCEP/NCAR values on the 50 hPa and 5 hPa surfaces. At 50 hPa, both 
simulations do an excellent job of producing climatological temperatures; most differences range 
from 0-3K. At 5 hPa, both simulations are too cool, but the FVDAS is generally about 3K closer 
to the climatological temperatures than the FVGCM. Also, there is no apparent pattern to the 
FVDAS differences while the FVGCM’s worst agreement proceeds from northern spring, 
through the tropics, to southern spring. 
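
The MPT comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GMI analysis code: it assumes cos(latitude) area weights, fixed 1 K bins between 150 K and 300 K, and daily gridded temperatures for one latitude band and pressure level. The function names are hypothetical.

```python
import numpy as np

# Assumed fixed temperature bins (K); the 1 K width is an illustrative choice.
EDGES = np.arange(150.0, 301.0, 1.0)
CENTERS = 0.5 * (EDGES[:-1] + EDGES[1:])


def daily_pdf(temps, lats_deg):
    """Area-weighted temperature pdf for one day, one latitude band, one level.

    temps and lats_deg are 1-D arrays over the grid points in the band;
    cos(latitude) approximates the area represented by each point.
    """
    weights = np.cos(np.deg2rad(lats_deg))
    hist, _ = np.histogram(temps, bins=EDGES, weights=weights)
    return hist / hist.sum()


def monthly_mean_pdf(daily_temps, lats_deg):
    """Mean of the daily pdfs in one month; daily_temps has shape (n_days, n_points)."""
    return np.mean([daily_pdf(day, lats_deg) for day in daily_temps], axis=0)


def most_probable_temperature(pdf):
    """Most probable temperature (MPT): centre of the most populated bin."""
    return CENTERS[np.argmax(pdf)]


def mpt_difference(model_pdf, clim_pdf):
    """Model minus climatological MPT (K) for one month, latitude band, and level."""
    return most_probable_temperature(model_pdf) - most_probable_temperature(clim_pdf)
```

A 20-year climatological pdf would be the mean of the monthly pdfs from each year, for example `np.mean([monthly_mean_pdf(t, lats) for t in yearly_temps], axis=0)`, where `yearly_temps` is a hypothetical list holding one month's daily fields from each year. Evaluating `mpt_difference` for 12 months and 11 latitude bands then yields the 132 comparison points on each pressure surface.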

It is important to remember that this is not a comparison of model climatology with NCEP 
climatology. FVDAS temperatures were assimilated for the period July 1999-June 2000 while 
the FVGCM temperatures represent one year of a 35-year GCM simulation. Model temperatures 
cannot be judged as good or bad by this test; rather, this test is a general reality check to ensure 
that neither model year chosen deviates too far from observed climatology. 
To summarize how each model level compares with observations, an area-weighted
distribution of the differences between one year of model and climatological most probable
temperatures is plotted in Figure 2. The FVDAS consistently produces MPTs in better
agreement with climatology than the FVGCM. For the 8 pressure levels tested, the FVDAS is 
within 3K of climatology 82% of the time, while the FVGCM agrees within 3K 69% of the time. 
The results at 100 hPa are quite interesting because this is the only level with a bimodal 
distribution of differences. Contouring these differences as a function of latitude and time
reveals the reason: both simulations have a bias toward low temperatures near the tropical
tropopause, while poleward of 30° each model has excellent agreement with climatology. 
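
The level-by-level summary can be sketched as an area-weighted histogram of the (month, band) differences on one pressure surface. This is a minimal sketch under assumptions: each point is weighted by the area of its latitude band, sin(north) - sin(south), and the band boundaries are left as inputs because the 11 NCEP/NCAR bands are not listed here. The function names are hypothetical.

```python
import numpy as np


def band_area_weights(south_deg, north_deg):
    """Relative area of latitude bands on a sphere: sin(north) - sin(south)."""
    return np.sin(np.deg2rad(north_deg)) - np.sin(np.deg2rad(south_deg))


def difference_distribution(diff, weights, edges):
    """Area-weighted histogram of MPT differences, normalised to sum to 1.

    diff has shape (12, n_bands): model minus climatological MPT for each month
    and latitude band on one pressure level; weights has shape (n_bands,).
    """
    w = np.broadcast_to(weights, diff.shape)
    hist, _ = np.histogram(diff, bins=edges, weights=w)
    return hist / hist.sum()


def fraction_within(diff, weights, tol=3.0):
    """Area-weighted fraction of (month, band) points with |difference| <= tol (K)."""
    w = np.broadcast_to(weights, diff.shape)
    return float(np.sum(w * (np.abs(diff) <= tol)) / np.sum(w))
```

Combining `fraction_within` over the 8 levels (with an assumed equal weighting of levels) gives a summary statistic of the kind quoted above for agreement within 3K.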

2.1.2 Residual Circulation 

Many GCMs cannot produce a quasi-biennial oscillation (QBO), so our tests of model 
circulation emphasize regions outside the tropics and subtropics, away from the secondary 
circulation set up by the QBO [Jones and Pyle, 1984]. We do wish to evaluate the mean behavior
of tropical transport, so the test in Section 2.1.2.2 includes profile data from both QBO phases,
and the test in Section 2.1.2.3 is a comparison that does not depend on the exact location of the
subtropical boundary, which varies with the phase of the QBO. 

2.1.2.1 Annual cycle of CH4 in the extratropical middle and upper stratosphere

The transport characteristics of the middle and upper stratosphere are relevant to the ozone 
distribution even though photochemistry strongly influences ozone there. For example, NOx
family chemistry dominates O3 loss in the sunlit middle stratosphere, but NOx mixing ratios
depend strongly on NOy abundance, which is controlled largely by transport. The dynamics and
composition of the upper stratosphere also affect ozone in the polar lower stratosphere because 
the strength of wave activity aloft determines descent rates and influences the fraction of upper 
stratospheric air reaching the lower stratosphere by the end of winter [Rosenfield and Schoeberl, 2002].

We can evaluate model transport and mixing processes affecting the upper stratosphere 
through analysis of the CH4 annual cycle. In the high latitudes it is a function of seasonally 
varying meridional transport, mixing, and descent. Photochemistry also matters in the summer 
upper stratosphere. The amplitude, phase, and variability of the CH4 annual cycle provide useful
measures of the timing and strength of transport and the presence of photochemical processes. 
HALOE CH4 data from 1992-1999 show large interannual variability equatorward of 44°, 
probably due to QBO influence, so we choose to study only the middle and high latitude ranges, 
44°-56° and 72°-80°, which show low interannual variability. CLAES CH4 must be used for this
test because HALOE does not sample the polar region in fall and winter. 

This test compares several features of the CH4 annual cycle at 800K (~10 hPa) and 1200K
(~5 hPa). Probability distribution functions (pdfs) calculated from daily 1992 CLAES data
within each latitude band are contoured together to show the amplitude, phase, and variability of 
the cycle. Figure 3 shows a one-year cycle of contoured pdfs of CH4 from CLAES, the FVGCM, 
and the FVDAS at 1200K for 4 extratropical latitude bands. UARS yaw maneuvers result in 35- 
day gaps poleward of 34° in each hemisphere 5 times a year. Narrow distributions (yellow and 
red) indicate a homogeneous atmosphere, which can result from rapid photochemistry or mixing, 
while broad regions (purple, dark blue) indicate that transport from a photochemically different 
region dominates processes that homogenize the atmosphere. Four diagnostic quantities derived 
from data are described below; the scores of GMIgcm and GMIdas are summarized in Table 1.

Most probable CH4 mixing ratios. We compare the most probable value rather than the
mean because it is not affected by the spatial distribution of the tracer and thus is a more accurate
measure of a region’s composition [Sparling, 2000]. In the real atmosphere, tracer mixing ratios
depend on various transport processes as well as chemistry. In a model, particularly in the upper
stratosphere, tracer mixing ratios also depend on the height of the model lid. For example, if the 
lid is near the stratopause, the model will not have a mesospheric source of low CH4 to descend
into the polar stratosphere in winter [Ma and Waugh, 2003]. While this test assesses whether the
general balance between transport and chemistry is right, the results may also reflect the 
implementation of the model. 

At each latitude band and height tested, the model receives a point for being within 25% of 
the observed value for the entire year, 0.5 point for being about 25% from the mean, and nothing 
for being more than 25% from the mean. In general, GMIgcm mixing ratios are much lower than 
CLAES at 1200K, but show better agreement at 800K, especially in the NH. The GMIdas most 
probable values are usually quite close to the CLAES values and the total score for the GMIdas 
was much higher than that of the GMIgcm.
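
The most probable value scoring rule can be sketched as follows, comparing model and observed values through the year. The text does not say how close to 25% counts as "about 25%", so the 5-percentage-point `margin` below is purely an assumption, as is the use of monthly most probable values; `score_most_probable` is a hypothetical name.

```python
import numpy as np


def score_most_probable(model_mpv, obs_mpv, tol=0.25, margin=0.05):
    """Score model CH4 most probable values (MPVs) against observed MPVs.

    model_mpv, obs_mpv: MPVs through the year (e.g. monthly) for one latitude
    band and theta level. Returns 1.0 if the model stays within tol of the
    observations all year, 0.5 if its worst departure is only "about" tol
    (taken here, as an assumption, to mean up to tol + margin), else 0.0.
    """
    model_mpv = np.asarray(model_mpv, dtype=float)
    obs_mpv = np.asarray(obs_mpv, dtype=float)
    worst = float(np.max(np.abs(model_mpv - obs_mpv) / obs_mpv))
    if worst <= tol:
        return 1.0
    if worst <= tol + margin:
        return 0.5
    return 0.0
```

Summing this score over the latitude bands and theta levels tested gives a total of the kind compared between the two simulations.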

Phase and amplitude of the annual cycle. This quantity reflects seasonal variations in the
radiative forcing and wave driving. To evaluate these aspects of the circulation independently of 
the mean state, model results are first scaled by the ratio of the observed/model most probable 
annual values. After scaling, the amplitude is judged by whether the model follows the observed 
annual cycle and stays within 25% of the observed values for the year. For this it receives a full 
point, 0.5 point for not exceeding 25% of observed, and 0 for a cycle beyond ±25% of observed. 
The phase is judged by requiring the model to have a minimum and maximum within a month of 
observed, or, when appropriate, for correctly lacking a cycle (1 point). The model receives a half 
point for getting only the minimum or maximum right, and nothing for a phase that bears no 
resemblance to the observations. Both simulations did extremely well.
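
The scaling and phase rules might be sketched as below, assuming 12 monthly most probable values and a cyclic calendar in which December and January are one month apart. The helper names are hypothetical, and the "correctly lacking a cycle" case is omitted because the text does not define how the absence of a cycle is detected.

```python
import numpy as np


def scale_to_observed(model_cycle, obs_annual_mpv, model_annual_mpv):
    """Remove the mean-state bias by multiplying by the observed/model annual MPV ratio."""
    return np.asarray(model_cycle, dtype=float) * (obs_annual_mpv / model_annual_mpv)


def month_distance(a, b):
    """Separation of two month indices on a 12-month cycle (December to January is 1)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def score_phase(model_cycle, obs_cycle, tol_months=1):
    """1.0 if both the minimum and maximum fall within tol_months of observed,
    0.5 if only one does, 0.0 otherwise. Inputs are 12 monthly values."""
    hits = 0
    for locate in (np.argmin, np.argmax):
        if month_distance(int(locate(model_cycle)), int(locate(obs_cycle))) <= tol_months:
            hits += 1
    return {2: 1.0, 1: 0.5, 0: 0.0}[hits]
```

Scaling first means that a model with a biased annual mean but a correctly timed cycle is not penalized twice.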
Variability and its seasonal cycle. Long-range transport increases CH4 variability while
mixing and photochemistry decrease it. Variability has its own seasonal cycle independent of the 
seasonal cycle of the most probable value. For example, the midlatitude panels in Figure 3 show 
almost no variation in the most probable CH4 value, yet the breadth of the distribution varies 
greatly between summer and winter. In the midlatitudes in both hemispheres, the pattern of 
CLAES CH4 variability shows a minimum in summer and a broad maximum in fall and winter. 
This indicates that wave activity is strongest in fall and winter and weakest in summer. In the 
Antarctic, the pattern is similar but with smaller variability (weaker wave activity) near the 
vortex, especially at 800K in winter (not shown). In the Arctic, variability is greatest in winter 
and quite low in summer when photochemistry is fast. This semi-quantitative test looks at the 
breadth of the model distribution in each season and looks for agreement with the seasonal cycle 
in variability. The model values are scaled, as before, so that variability is judged independently 
of the annual mean. 
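
Because this grading is semi-quantitative, code can only approximate it. The sketch below measures the breadth of the scaled distribution in each season with the interquartile range (our choice, not necessarily the measure used in this evaluation) and checks whether the minimum falls in the expected season. The season definitions and function names are assumptions.

```python
import numpy as np

# Meteorological seasons by month number (1-12); an illustrative assumption.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def seasonal_breadth(values, months, scale=1.0):
    """Interquartile range of scaled CH4 samples in each season.

    values: CH4 samples in one latitude band and theta level; months: month (1-12)
    of each sample; scale: observed/model annual most-probable-value ratio.
    """
    values = np.asarray(values, dtype=float) * scale
    months = np.asarray(months)
    breadth = {}
    for name, season_months in SEASONS.items():
        sample = values[np.isin(months, season_months)]
        q25, q75 = np.percentile(sample, [25, 75])
        breadth[name] = q75 - q25
    return breadth


def minimum_in_season(breadth, season="JJA"):
    """True if variability is smallest in the given season (northern summer by default)."""
    return min(breadth, key=breadth.get) == season
```

For a southern hemisphere band, summer would be checked with `season="DJF"`.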

The GMIgcm overall shows a very good cycle of CH4 variability at all latitudes and heights 
tested. Its cycles are good because there is always a minimum in variability in summer (i.e., 
wave-driven transport does not interfere with the reduction of variability by fast photochemistry) 
and a maximum in variability in the appropriate cool season. Its grades are less than perfect in 
the polar regions because the winter wave driving appears to be a little weak. This is consistent 
with the results of the most probable value test, where low CH4 at 1200K also pointed to weak wave driving.
The GMIgcm midlatitude cycle is in very good agreement with CLAES. 

The GMIdas does not consistently show the right cycle of CH4 variability. Like the 
GMIgcm, wave driving in the polar regions is too weak, but it receives lower grades because of 
noisiness in summer, especially in the northern hemisphere. Like the GMIgcm, the GMIdas does 
best in the midlatitudes, with the exception of 1200K in the north. The minimum variability
occurs too soon there (end of winter), and then it grows to be very large in spring and summer, 
looking nothing like the observations. The large summer variability seen in the northern 
midlatitudes in the GMIdas could be caused by excessive tropics-to-pole transport. This will be 
evaluated in Section 2.1.2.3.

2.1.2.2 Seasonal variations in lower stratospheric N2O

Lower stratospheric profiles of the long-lived tracer N2O reflect the balance between the
diabatic circulation and meridional mixing there. Seasonal mean N2O profiles for the midlatitudes
and tropics, derived from 11 years of ER-2 N2O data, are used for this test [Strahan et al., 1999].
This test was described in Douglass et al. [1999] (Test 2b), and the
details of the grading can be found there. Figure 4 shows examples of the model/data agreement. 
The GMIgcm scores higher than the best simulation of GMI-1. Both simulations perform acceptably
in the tropics and northern midlatitudes, and neither has excellent agreement in the southern
hemisphere (SH). The GMIgcm is closer to observations than the GMIdas.

2.1.2.3 Tropical isolation in the middle and upper stratosphere 

This is a variation of the original 'Test 3' in Douglass et al. [1999], in which the bimodality 
of CLAES N2O pdfs between 10°S and 45°N was used to assess a model’s ability to produce a
sufficient tropical/midlatitude barrier. The barrier is a steep gradient in potential vorticity that
arises from midlatitude Rossby wave breaking [Polvani et al., 1995]. The wave breaking, which
causes mixing in the surf zone, cannot penetrate the tropics, resulting in distinctly different tracer 
mixing ratios in the two regions. Ascent of young air gives high tracer values in the tropics, 
while the older air of the middle and high latitudes has lower mixing ratios and a broad 
distribution. Seasonal variations in wave driving cause variations in the barrier strength, affecting 
the distinctiveness of tropical and midlatitude air masses. The original test was conducted on 3
pressure surfaces from 31-7 hPa, while here we use 4 theta surfaces from 600-1200K. A wide 
latitude range is chosen so that QBO phase-dependent variations in the subtropical boundary will 
not affect the modality of the pdf. The phase-dependent secondary circulation set up by the QBO 
also causes significant interannual variability in constituent mixing ratios [Jones and Pyle, 1984].
With less than 2 years of CLAES N2O data, we compare only the modality of the distribution
and not the mixing ratios or the absolute separation of the peaks. 

The models are graded every 200K between 600K and 1200K. The CLAES N2O pdfs show
isolation of the tropics at all levels and in all seasons compared. (Spring is excluded because the
observed N2O distribution was nonstationary.) A simulation is granted 1 point for producing 2
peaks separated by a minimum, even a weak one. A half point is given for a tropical peak with a
long midlatitude tail instead of a clear minimum, and no points are given if a single, short-tailed
(i.e., well mixed) peak is found. Figure 5 provides examples of the performance of these 
simulations at 800K. The GMIgcm maintains a tropical/midlatitude separation in all 3 seasons 
(scoring 93%), while the GMIdas makes a clear separation only in winter (scoring 54%). The 
GMIgcm performed equally well at all levels tested, while the GMIdas showed decreasing 
tropical isolation with increasing height. 
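
One way to automate the modality grading is sketched below. Peaks are counted in a lightly smoothed histogram, and a "long midlatitude tail" is taken to mean that more than a given fraction of samples lie well below the main (tropical) peak. The 3-bin smoothing, `tail_ppbv`, `tail_frac`, and the function names are hypothetical choices, not the procedure behind the scores quoted above.

```python
import numpy as np


def count_peaks(pdf):
    """Indices of interior local maxima of a (smoothed) pdf."""
    p = np.asarray(pdf, dtype=float)
    return [i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] >= p[i + 1]]


def grade_isolation(n2o, edges, tail_ppbv=50.0, tail_frac=0.1):
    """Grade tropical/midlatitude separation from N2O samples (ppbv) between 10S and 45N.

    1.0: two peaks separated by a minimum; 0.5: one peak with a long low-N2O
    tail; 0.0: a single short-tailed (well-mixed) peak.
    tail_ppbv and tail_frac are illustrative assumptions.
    """
    pdf, edges = np.histogram(n2o, bins=edges)
    smooth = np.convolve(pdf, np.ones(3) / 3.0, mode="same")  # 3-bin running mean
    if len(count_peaks(smooth)) >= 2:
        return 1.0
    centers = 0.5 * (edges[:-1] + edges[1:])
    main_peak = centers[np.argmax(smooth)]
    if np.mean(np.asarray(n2o) < main_peak - tail_ppbv) > tail_frac:
        return 0.5
    return 0.0
```

Averaging these grades over the theta surfaces and seasons compared would give a percentage score for each simulation.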

2.1.3 Upper Troposphere/Lower Stratosphere Separation

This simple test is important because it gauges whether a model has the correct pathway of 
transport from the upper troposphere to lower stratosphere. Referred to as ‘Test 5’ in Douglass et 
al. [1999], it examines the phase lag of the CO2 seasonal cycle across the tropopause at 60°N.
Models must show at least a 2-month lag between the CO2 seasonal cycle maximum on the
highest tropospheric and the lowest stratospheric levels. The presence of the lag indicates that air
from the extratropical upper troposphere does not go directly up into the lower stratosphere, but 
takes a path to the stratosphere via the tropical tropopause [Boering et al., 1996; Strahan et al.,
1998]. This test is useful for identifying simulations with excessive convective transport. In this 
study, both simulations pass. 
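
The lag criterion reduces to comparing the months of the two seasonal maxima. In this minimal sketch, the lag is wrapped into the range -5 to +6 months so that a stratospheric maximum that leads the tropospheric one is not mistaken for a long lag. The monthly-mean inputs and the function names are assumptions.

```python
import numpy as np


def seasonal_max_lag(co2_trop, co2_strat):
    """Months by which the lowest-stratospheric CO2 maximum trails the upper-tropospheric one.

    Inputs are 12 monthly means; the lag is wrapped into -5..+6 months, so a
    stratospheric maximum that leads the tropospheric one gives a negative lag.
    """
    d = int(np.argmax(co2_strat)) - int(np.argmax(co2_trop))
    return (d + 5) % 12 - 5


def passes_lag_test(co2_trop, co2_strat, min_lag=2):
    """Pass if the stratospheric maximum lags by at least min_lag months (2 in the text)."""
    return seasonal_max_lag(co2_trop, co2_strat) >= min_lag
```

A simulation with excessive convective transport would show a lag near zero and fail.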

2.2 Specialized tests relevant to ozone simulations 

2.2.1 Spatial and temporal coverage of PSC-producing temperatures 

In regions where heterogeneous chemical reactions occur, the distribution of temperatures is 
more important than the mean. Austin et al. [2003] evaluated model polar temperatures by 
calculating the product of models’ areal and temporal coverage of NAT- and ice-forming 
temperatures in each hemisphere and comparing with that quantity derived from observations. In 
this test we judge a model by whether it can produce a spatially realistic distribution of NAT- 
forming temperatures during the appropriate months. Since polar ozone loss depends on both low 
temperatures and sunlight, this test is designed to look specifically at the latitudes and months 
where PSC-forming temperatures are reached. The 'score' for this test is a description of the 
model’s behavior compared to climatology. 

This test uses 20 years of the NCEP/NCAR reanalysis to calculate a climatological 
temperature distribution for 3 latitude bands in each polar region (70°-90°, 60°-70°, and 50°-60°) 
at 6 pressure levels from 150-10 hPa. Since a cold bias in the Antarctic stratosphere is
a longstanding model issue [Pawson et al., 2000], this test is useful not only to determine if a 
model makes enough PSCs deep in the Antarctic vortex, but to assess whether a model forms 
PSCs outside of the expected latitudes, heights, and seasons. This test labels a model’s behavior 
as climatologically normal, somewhat warmer or colder than normal, or unrealistically warmer 
or colder than normal.


We calculate the area-weighted fraction of each latitude band that is covered by 
temperatures at or below the PSC frost point (calculated with 4 ppmv H2O) on each pressure
surface in one month. Using 20 years of NCEP/NCAR analyses, the mean and standard deviation 
are calculated for the area-weighted fraction and compared with the same model-derived 
quantities. Model fractional coverage within 1.5 standard deviations (σ) of the climatological
fraction is labeled ‘normal’; fractional coverage between 1.5σ and 3σ above (below) the
observed mean is considered ‘colder (warmer) than normal’, and coverage more than 3σ above (below) the mean is
‘much colder (warmer) than normal’ and possibly unphysical.
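
The classification can be sketched as follows, assuming gridded temperatures, cos(latitude) area weights, and the sample standard deviation of the 20 yearly fractions. The PSC threshold temperature is passed in rather than computed, since the frost-point formula is not given here. The handling of a zero standard deviation and all function names are our own assumptions.

```python
import numpy as np


def psc_area_fraction(temps, lats_deg, t_psc):
    """Area-weighted fraction of a latitude band at or below the PSC threshold t_psc (K).

    temps has shape (n_lat, n_lon); lats_deg must broadcast against it, e.g. shape (n_lat, 1).
    """
    temps = np.asarray(temps, dtype=float)
    w = np.broadcast_to(np.cos(np.deg2rad(lats_deg)), temps.shape)
    return float(np.sum(w * (temps <= t_psc)) / np.sum(w))


def classify_coverage(model_frac, clim_fracs):
    """Label one month's model PSC coverage against yearly climatological fractions.

    More coverage than climatology means colder; 1.5 and 3 standard deviations
    separate the categories, as in the text.
    """
    mu = float(np.mean(clim_fracs))
    sigma = float(np.std(clim_fracs, ddof=1))  # sample standard deviation (assumption)
    if sigma == 0.0:
        # Assumption: with no interannual spread, any departure counts as extreme.
        if np.isclose(model_frac, mu):
            return "normal"
        z = np.inf if model_frac > mu else -np.inf
    else:
        z = (model_frac - mu) / sigma
    if abs(z) <= 1.5:
        return "normal"
    side = "colder" if z > 0 else "warmer"
    if abs(z) <= 3.0:
        return f"{side} than normal"
    return f"much {side} than normal"
```

Applying `classify_coverage` to each month, latitude band, and pressure level produces a label map of the kind shown for the two polar regions.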

Figure 6 characterizes polar temperature in the FVGCM and FVDAS from May to 
November. In fall, winter, and spring, both models produce climatologically normal 
temperatures from 50-90°S and 150-10 hPa most of the time, but usually some part of the polar
region is below normal during each month. The FVGCM tends to be a little too cold at 100 hPa 
and below, especially at latitudes near the edge and outside the vortex. Both models are too cold 
inside the vortex near 30 hPa in late winter, which could lead to greater ozone loss there. The 
information presented in this figure provides a context for the interpretation of ozone behavior in 
Considine et al. [2003] and Connell and Douglass [2003].

The same analysis was performed for December through March in the northern hemisphere. 
The Arctic has higher mean temperatures and much larger interannual temperature variability 
than the Antarctic, such that being within 1.5σ of the mean can mean having some PSCs or none
at all. Figure 7 characterizes the Arctic vortex temperature in the models. Much of the FVGCM 
vortex and edge region can be labeled ‘normal’. Temperatures are on the warm side of the mean 
in early winter (no PSCs) while it is quite cold from 70-150 hPa in late winter. However, the 
absolute effect on PSC areal coverage is small; for example, ‘much colder than normal’ at 100
hPa in March still means less than 10% coverage, compared to about 2% climatologically. The 
FVDAS temperatures are climatologically normal much of the time, with the exception that 
February and March are very much colder than average at lower levels. The FVDAS assimilated 
wind fields represent the period July 1, 1999-June 30, 2000, reflecting the unusually low lower 
stratospheric temperatures observed in March 2000 [Newman et al., 2002].

The particular FVGCM year evaluated here, part of a 35-yr FVGCM integration, was 
selected because it was the most like the ‘SOLVE’ winter. It is interesting to note that this test 
rates the FVGCM winter as ‘normal’. An average FVGCM winter from this multi-year 
integration would probably be rated ‘warmer than normal’. 

2.2.2 Lower stratospheric vortex behavior during breakdown 

The huge ozone loss rates observed in the Antarctic vortex in spring are the result of a 
unique combination of dynamics and chemistry found nowhere else. A model’s ability to 
realistically simulate vortex erosion and mixing processes during breakdown may be crucial to 
its credibility in predicting how declining halogens will affect the depth and the dispersion of 
Antarctic ozone loss. For example, if a model brings midlatitude air into the vortex in early 
spring, this intrusion of non-denitrified air will cause Cl-catalyzed loss processes to shut down 
prematurely. Ozone will not get realistically low inside the vortex, and less ozone-depleted air 
will be dispersed to lower latitudes. This test gauges vortex isolation and exchange between the 
vortex and midlatitudes in spring. 

Methane measurements from HALOE and CLAES are both suitable for developing a mixing 
and isolation diagnostic, but the HALOE data provide a more complete picture. CLAES has 
good spatial coverage down to 80°S for a month at a time, but there are no measurements in 
October and only one austral spring was sampled. CLAES CH4 data also have large uncertainty at
450K, where ozone loss rates are greatest [Roche et al., 1996; Considine et al., 2003]. While
HALOE obtains only 15 profiles a day in a hemisphere, it has done so for a decade; low 
interannual variability in the southern hemisphere allows those measurements to be sensibly 
combined to derive a mean dynamical picture of vortex development and breakdown for the 
entire austral spring. The two CH4 data sets can be compared on the 600K surface in September 
and November, where uncertainties are acceptable. In a prototype test, HALOE and CLAES CH4 
pdfs exhibited the same dynamical features, lending confidence to the use of HALOE data, 
which has far less spatial coverage in any single year. 

The test examines the springtime evolution of the CH4 pdfs of two latitude bands, 60-80°S 
and 40-60°S on the 450K and 600K surfaces. Pdfs are derived by binning 8 years of 
measurements from each latitude range for each month of spring. The 60-80°S range is almost 
strictly vortex air in early spring, retaining a small but isolated vortex core into November. The 
40-60°S band is almost strictly midlatitude, or ‘surf zone’, air. The dynamics of vortex 
breakdown and the extent and direction of mixing between the vortex and surf zone are revealed 
by several features of the pdfs: the separation of the peaks, the depth of the minimum between 
the peaks, and changes in the means and most probable values of the peaks during spring. 
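The pdf features listed above can be computed from binned measurements. The sketch below is a hypothetical Python illustration, not the authors' code: the 50 ppb bin width, the function names and the CH4 values are made up, and the overlap coefficient is our stand-in for the depth of the minimum between the peaks (0 for fully separated air masses, 1 for identical ones).

```python
from collections import Counter

BIN_WIDTH = 50.0  # ppb; hypothetical bin width


def ch4_pdf(values_ppb, bin_width=BIN_WIDTH):
    """Normalized pdf of CH4 mixing ratios, keyed by each bin's lower edge (ppb)."""
    counts = Counter(int(v // bin_width) for v in values_ppb)
    n = len(values_ppb)
    return {k * bin_width: c / (n * bin_width) for k, c in sorted(counts.items())}


def most_probable_value(pdf, bin_width=BIN_WIDTH):
    """Centre of the bin with the highest probability density."""
    edge = max(pdf, key=pdf.get)
    return edge + bin_width / 2


def overlap(pdf_a, pdf_b, bin_width=BIN_WIDTH):
    """Probability shared by two pdfs: 0 = fully separated, 1 = identical."""
    bins = set(pdf_a) | set(pdf_b)
    return sum(min(pdf_a.get(b, 0.0), pdf_b.get(b, 0.0)) for b in bins) * bin_width


# Made-up samples for one month: 60-80S (vortex) and 40-60S (surf zone), ppb.
vortex = [900, 920, 940, 960, 1010]
surf_zone = [1380, 1400, 1410, 1420, 1460]
p_vortex, p_surf = ch4_pdf(vortex), ch4_pdf(surf_zone)
print(most_probable_value(p_surf) - most_probable_value(p_vortex))  # 500.0
print(overlap(p_vortex, p_surf))  # 0.0
```

Tracking these quantities month by month mirrors the diagnostics discussed below: a growing peak separation with near-zero overlap indicates an isolated vortex, while a shrinking separation and rising overlap indicate exchange.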

The evolution of HALOE CH4 pdfs, shown in the middle column of Figure 8, reveals the
process of vortex breakdown in the Antarctic lower stratosphere. HALOE September data show 
two broad but distinct distributions with peaks separated by about 400 ppb; the lack of a deep
minimum indicates that a strong barrier to mixing has not yet developed. By October and November,
a deep minimum indicates the development of a very sharp boundary between the regions. Mixing
within the vortex also increases during these months, as indicated by the narrower vortex
distribution. The most probable value decreases, but because CH4 decreases toward the pole, this may
reflect the higher mean sampling latitude in October (69°S) than in September (64°S) or
November (67°S). The most probable value in the vortex declines more than 30 ppb between
October and November, arguing strongly against any intrusion of surf zone air, which is typically 
500 ppb higher in CH4; even a narrow band of mixing at the vortex edge would result in an 
increase of the vortex mean. Diabatic heating is near zero in spring [Rosenfield et al., 1994; 
Rosenfield and Schoeberl, 2001] and thus descent is not expected to be important. Notice that the 
high CH4 peak of the 60-80°S November distribution is nearly identical to the 40-60°S peak, 
suggesting that air exiting the vortex becomes rapidly mixed into the surf zone. The development 
of this bimodal structure in the 60-80°S band and the endurance of the low CH4 (vortex) peak 
indicate that (1) the vortex breaks down by erosion, and (2) breakdown appears to be a one-way
process with no evidence of midlatitude air mixing into the vortex. The HALOE data at 600K,
not shown, give a similar picture of breakdown and show an even stronger barrier to mixing. 
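The argument that a falling vortex value rules out intrusion of surf zone air can be checked with simple two-member mixing arithmetic. In this hedged Python sketch, the function name and the 1000 ppb vortex value are hypothetical; only the roughly 500 ppb vortex-to-surf-zone contrast comes from the text, and descent is neglected, as argued above.

```python
def mixed_mean(vortex_ppb, surf_ppb, frac_surf):
    """Mean CH4 (ppb) after a fraction frac_surf of vortex air is replaced
    by surf zone air, assuming linear mixing and no descent or chemistry."""
    return (1.0 - frac_surf) * vortex_ppb + frac_surf * surf_ppb


vortex = 1000.0        # hypothetical vortex CH4, ppb
surf = vortex + 500.0  # surf zone air ~500 ppb higher (from the text)
for f in (0.02, 0.05, 0.10):
    change = mixed_mean(vortex, surf, f) - vortex
    print(f"{f:.0%} intrusion raises the vortex mean by {change:.0f} ppb")
```

Even a 5% intrusion would raise the vortex mean by about 25 ppb, opposite in sign to the observed decline of more than 30 ppb between October and November.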

This data set provides a clear picture of vortex breakdown and an excellent basis for model 
evaluation of this process. The left column of Figure 8 shows the breakdown of the GMIgcm 
vortex. The evolution of the model breakdown differs in many ways from the observations. 
While the September distributions have an acceptable 400 ppb separation, that separation 
decreases in the following months, both distributions narrow (no long tails exist at any time), no 
deep and wide minimum forms between the peaks, and the most probable vortex value increases 
by nearly 200 ppb in stark contrast to the 100 ppb decrease in the observations. Vortex evolution 
in the GMIdas simulation is very similar. The separation of the GMIdas distributions is smaller 
to begin with and shrinks rapidly in spring. By November the GMIdas vortex and midlatitude 
distributions are strongly overlapping. 
In both GMI CTM simulations, vortex evolution is clearly very different from reality. The
decreasing separation of the peaks and the lack of a deep minimum between them indicate that a 
strong barrier to mixing never forms. The fact that the model vortex most probable value 
increases during spring suggests that midlatitude air is mixing into the vortex. The HALOE pdfs 
clearly show that the vortex breaks down by erosion rather than entrainment of midlatitude air, 
revealing a fundamental difference between modeled and observed vortex behavior. By 
November, the GMIdas CH4 distributions indicate a nearly homogeneous region from 40-80°S, 
in great contrast to the HALOE pdfs that indicate the persistence of a small, well-isolated vortex. 
The large ozone losses observed each October in the Antarctic are possible only because of the 
strict isolation of chemically perturbed air inside the vortex. This unique requirement of both 
chemical processing and isolation cannot be met in a model that lacks a strong barrier to mixing 
at the vortex edge. Poor performance on this test suggests significant consequences for model 
vortex mixing ratios of Cly, ClO, and NOx, and hence for ozone loss as well.

The likely reason for the increase in midlatitude HALOE CH4 from 1370 to 1500 ppb during 
spring is wave-driven mixing in the surf zone, which brings high CH4 poleward from low 
latitudes. At the same time, high latitude (vortex) CH4 decreases, demonstrating the continued
isolation of the vortex. The separation of the peaks of the HALOE vortex and midlatitude 
distributions increases during spring at 450K and 600K. In contrast to the HALOE analysis, both 
the GMIgcm and GMIdas show mean CH4 in the vortex increasing rapidly toward surf zone 
values in spring, suggesting a continuous exchange of air between the vortex and midlatitudes; 
the GMIdas has stronger exchange than the GMIgcm. The models’ mean values at 40-60°S are
also increasing; further analysis showed this distribution merges with the 30-40°S distribution by 
November. The lack of vortex isolation in the simulations compromises their ability to sequester 
chemically perturbed vortex air. The unphysical transport characteristics shown here may lead to
significant consequences not only for the depth of ozone depletion, but for the timing of the 
dispersal of ozone-depleted air to midlatitudes during breakdown. In two experiments performed 
with FVGCM meteorological fields, one in another CTM and the other with CH4 transported 
online in the FVGCM, the Antarctic vortex was well isolated. GMI-CTM details such as the 
horizontal resolution and the implementation of the Lin and Rood [1996] numerical transport 
scheme probably contribute to the transport characteristics displayed here. 

Figure 9 compares model and HALOE Arctic pdfs on the 600K surface. There are no 
HALOE high latitude measurements in February. Although the Arctic vortex in March is much 
smaller than the Antarctic in September, the observations show a clear separation between the 
tiny vortex and the midlatitudes. The GMIgcm manages to keep some separation between the air 
masses, but the GMIdas shows a completely homogenized region from 40-80°N by March; the
observations indicate mixing is still incomplete in April. The GMIgcm has broad distributions in 
February and March while the GMIdas distributions are much narrower, indicating stronger 
horizontal mixing in the GMIdas. In a typical Arctic winter with small O3 losses, excessive
mixing across the vortex edge will have little impact on the O3 distribution since its horizontal 
gradients are fairly flat in the 500-600K range [Strahan, 2002]. However, should the Arctic 
stratosphere have a cooling trend in the 21st century with concomitantly larger wintertime O3
losses, unrealistic vortex isolation such as shown here may invalidate model predictions. 

3. Grading summary; model credibility 

Most air enters the stratosphere through the tropical tropopause. Both simulations begin this 
journey reasonably well, with good agreement between modeled and observed N2O profiles in
the tropical lower stratosphere. At 600 K and above, the GMIdas is unable to maintain an isolated
tropical air mass in summer and fall, unlike the GMIgcm, which maintains a distinct tropical air
mass at all altitudes and seasons tested. The weak tropical barrier in the GMIdas allows too
much high-N2O air into the midlatitudes, weakening meridional gradients there and resulting in
lower stratospheric profiles with high N2O (e.g., Figure 4). The GMIdas tropical isolation gets
worse with height. The excessive exchange between the tropics and midlatitudes in the GMIdas 
leads to problems with mid and high latitude tracer distributions and variability. 

The test of CH4 annual cycles in the southern extratropics shows insufficient wave driving in 
the austral fall. Wave driving brings high CH4 from the low latitudes to the polar region. The 
inadequacy of the wave-driven transport is seen in the GMIgcm Antarctic upper stratosphere in 
the form of too little CH4, low variability, and almost no CH4 increase in fall and winter
compared to CLAES (Figure 3, 1200K, 72-80°S). Similar CH4 behavior is seen in an FVGCM
simulation with online chemistry, suggesting the GMI-CTM implementation is not the cause of
this disagreement with the observations. The GMIdas shows more of this transport occurring.
When the GMIgcm vortex forms in fall and descent begins in the upper stratosphere, CH4 values
lower than observed are trapped in the descending vortex. By late winter, CH4 in the GMIgcm
Antarctic vortex is in close agreement with CLAES (600K-1200K), indicating some midlatitude
air has mixed into the vortex during fall and winter; by spring, model CH4 has become too high.

In Figure 8, showing the separation of vortex and midlatitude air, the vortex most probable
value rises in the GMIgcm while it decreases in the observations. The GMIdas shows the same
trend but starts with even higher CH4 in the vortex. As previously discussed, these test results
indicate that the lack of isolation allows too much high-CH4 air from middle latitudes into the
vortex. However, because an online FVGCM experiment has demonstrated the model’s ability to
produce nearly realistic vortex isolation year after year, we suspect that the lack of vortex
isolation in the GMI simulations implicates the CTM implementation and/or horizontal
resolution.

The overall result of the middle and high latitude residual circulation tests is that tracer 
transport in the offline GMIgcm has greater fidelity throughout the stratosphere than the GMIdas.
The GMIgcm has greater realism in its northern hemisphere than its southern hemisphere and it 
performs best in the middle stratosphere. The southern hemisphere upper stratosphere is where 
the GMIgcm has the worst comparison with observations and where the GMIdas scores 
considerably higher. The GMIdas performs best in the midlatitudes, north and south, but 
struggles with the southern hemisphere lower stratosphere. In the polar lower stratosphere, 
neither simulation is able to develop an impermeable vortex edge. Table 2 summarizes residual
circulation grading as a function of height and hemisphere, and rates barrier formation ability. 

Temperature-dependent gas phase reactions are likely to be carried out at the right rates in 
both simulations. At 30 hPa and below, each model achieves climatologically normal 
temperatures at nearly all latitudes and seasons. Higher up, both models are biased slightly low,
but near the stratopause both models are a few degrees too high. Overall, the FVDAS is always 
closer to NCEP 20-yr climatological temperatures than the FVGCM at all levels in the 
stratosphere. 
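The temperature comparison summarized above (and in Figures 1 and 2) rests on area-weighted differences between model and climatological most probable temperatures (MPTs). A minimal Python sketch of the weighting, assuming each latitude band is weighted by its exact fraction of the globe's area, (sin(lat_N) - sin(lat_S))/2, and using 3 K histogram bins borrowed from the Figure 1 contour interval; both choices, the function names and the sample inputs are assumptions rather than the documented procedure.

```python
import math


def band_area_weight(lat_south, lat_north):
    """Fraction of the globe's surface between two latitudes (degrees)."""
    return (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))) / 2.0


def weighted_histogram(diffs, bin_width=3.0):
    """Area-weighted histogram of MPT differences.

    diffs: iterable of (lat_south, lat_north, model_mpt - clim_mpt) tuples,
    e.g. one per latitude band and month (11 bands x 12 months = 132).
    Returns {lower bin edge in K: fraction of total area weight}.
    """
    hist, total = {}, 0.0
    for lat_s, lat_n, d in diffs:
        w = band_area_weight(lat_s, lat_n)
        edge = math.floor(d / bin_width) * bin_width
        hist[edge] = hist.get(edge, 0.0) + w
        total += w
    return {edge: w / total for edge, w in sorted(hist.items())}


# Two hypothetical comparisons: a tropical band 1 K warm, a subtropical band 4 K cold.
print(weighted_histogram([(-10.0, 10.0, 1.0), (10.0, 30.0, -4.0)]))
```

Weighting by area keeps the many narrow high-latitude bands from dominating a histogram whose purpose is to summarize the stratosphere as a whole.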

Both simulations do a good job of producing near climatological temperatures for the 
production of Antarctic PSCs, including realistic areal coverage as a function of month, latitude,
and altitude. The region of FVGCM PSC-producing temperatures extends a little too far from the 
pole, particularly at 100 hPa and below. The FVDAS Antarctic lower stratosphere warms a little
later in spring than the climatological average. In the Arctic, the FVGCM was climatologically 
near normal at 50 hPa and above, but cold in the late winter vortex at 70 hPa and below. The
FVDAS was near normal in much of the Arctic, but much below normal in March outside the 
vortex near 70 hPa, reflecting the cold stratospheric spring of 2000. 

Overall, the GMIgcm does a better job of barrier formation, particularly in the tropics, though
only moderately so in the polar regions. This is consistent with the model mean age-of-air comparison
showing the GMIgcm to have older air in the polar lower stratosphere than the GMIdas 
[Considine et al., 2003]. The primary weakness in the GMIgcm appears in the southern 
hemisphere spring and fall, the seasons when the greatest wave activity should occur [Randel, 
1988]. The GMIdas does a better job there, suggesting that the insertion of observations in the
DAS may improve GCM deficiencies in this region. 

4. Conclusions 

No model can faithfully represent all known atmospheric processes, but by understanding 
both the skills and the deficiencies of a model, one can determine its best use. Transport and
chemistry influence the distribution of ozone at all altitudes and latitudes of the stratosphere, 
requiring a model to perform well just about everywhere. To study the effect of changing 
chlorine levels on stratospheric ozone, a model requires additional testing in regions where 
chlorine plays a significant role in ozone loss (i.e., the upper stratosphere and the polar lower 
stratosphere). This reflects the philosophy of evaluation used here. 

These evaluations provide insight into the usefulness of offline chemistry and transport 
simulations using the FVGCM and FVDAS meteorological fields. The quality of these 
simulations is affected not only by the input meteorological fields, but by the offline model 
advection scheme, resolution of the CTM, and the implementation of various CTM components 
(e.g., the chemical mechanism). Differences in the time steps of the chemical mechanism and the
advection scheme will lead to interactions between these modules, especially for diurnally
varying species near the terminator. This leads to inherent differences in performance between
offline and online chemistry. Experiments performed at 2°x2.5° horizontal resolution will not 
give the same results as a 4°x5° experiment, especially for meridional tracer gradients and for the 
CH4 vortex mixing diagnostic. Experiments with online parameterized CH4 chemistry in the
FVGCM revealed that tracer transport is less diffusive and more realistic online than with the 
same meteorological fields in the offline model. Using the model diagnostics shown here, 
sensitivity of results to resolution and implementation choices can be tested, allowing the user to 
select simulations with the greatest fidelity to physical processes. Objective evaluation of model 
processes, using diagnostics such as those presented here and in Douglass et al. [1999], provides a
way to reduce uncertainty in model calculations. 
Figure Captions

Figure 1. The difference between the models’ and the 20-yr NCEP climatological most probable
temperatures on two pressure surfaces. Most probable values are calculated monthly for 11 latitude bands.
Contour intervals are 3K. At 50 hPa, both simulations are usually within 0-3K of the
climatological value. The top panels show a cold bias in both simulations in the upper
stratosphere (5 hPa). The FVDAS bias is smaller (mostly -3 to -6 K), while the FVGCM bias is
frequently -6 to -9K, with variable bias at high latitudes.

Figure 2. Summary of model temperature behavior with respect to the 20-yr NCEP climatology. 
Each histogram gives the area-weighted difference between model and climatological most 
probable temperatures (MPTs) on a given pressure surface for each latitude band and month of 
the year. (Each histogram contains 132 MPT comparisons.) 

Figure 3. Comparison of CLAES CH4 extratropical annual cycles with GMIgcm and GMIdas for
4 latitude bands on the 1200K surface. The annual cycles are produced from contours of daily
CH4 pdfs. Yellow and red indicate a high probability of that mixing ratio, indicating a well-mixed
distribution. Blue and purple, which represent low probability, are usually part of broad
distributions. Broad distributions arise when long range transport dominates processes which 
reduce variability, namely, rapid photochemistry and mixing. 

Figure 4. Comparison of seasonal mean N2O profiles calculated from aircraft data with model
profiles in the lower stratosphere over 3 latitude ranges and two seasons. Both models are
consistently higher than the observations at and above 420K, with the GMIgcm usually lying
closer to the observations than the GMIdas.

Figure 5. Separation of tropical and midlatitude air masses illustrated by CLAES N2O pdfs and
model comparisons on the 800K surface. While both models maintain separation in winter at all
levels examined, only the GMIgcm consistently keeps a clear separation in summer and fall at all
levels.

Figure 6. Distribution of normal, below normal (1.5-3σ), and much below normal (>3σ) model
temperatures in the Antarctic during the cold seasons. Both models have large areas of 
climatologically normal temperatures. 

Figure 7. Same as Figure 6, except for the Arctic winter. Although the FVDAS is unusually cold
in March, this accurately represents Arctic temperatures that year (2000). Overall, the FVGCM is
warmer in the Arctic and makes fewer PSCs than the FVDAS, but because of the large 
variability there, both models are categorized as climatologically normal. 

Figure 8. Evolution of CH4 distributions on the 450K surface inside and outside the Antarctic
vortex in spring. The central column, representing an 8-yr accumulation of HALOE observations
in austral spring, demonstrates that the vortex air mass maintains its identity while gradually 
eroding. The GMIgcm simulation (left column) maintains some separation through the spring, 
but large overlap between the distributions indicates exchange between the regions - in contrast 
to the observations. The GMIdas simulation (right column) does a worse job of maintaining 
separation, and by November the vortex and midlatitudes are nearly identical (i.e., well mixed).

Figure 9. Evolution of CH4 distributions on the 600K surface inside and outside the Arctic 
vortex in spring. The GMIgcm distributions (left column) show good separation in February and 
March, with near total mixing by April. HALOE data, accumulated over 8 years (center column), 
show a small, distinct vortex in April. The GMIdas simulation (right column) cannot keep the 
regions distinct even in February. A substantial vortex existed all winter in 2000 [Newman et al.,
2002], but the GMIdas vortex and midlatitudes are indistinct in this simulation.
Table 1. Residual Circulation Test Results (from Section 2.1.2)
Table 2. Summary of Transport Performance